Sixth International Joint Conference on Natural Language Processing Proceedings of the Fourth Workshop on South and Southeast Asian Natural Language Processing
Not recorded
Abstract
This paper deals with the fast bootstrapping of a grapheme-to-phoneme (G2P) conversion system, a key module for both automatic speech recognition (ASR) and text-to-speech synthesis (TTS). The idea is to exploit language contact between a locally dominant language (Malay) and a very under-resourced language (Iban, spoken in Sarawak and in several parts of the island of Borneo) for which almost no resources or linguistic knowledge are available. More precisely, a pre-existing Malay G2P system is used to produce phoneme sequences for Iban words. The phonemes are then manually post-edited (corrected) by a native Iban speaker. This resource, produced in a semi-supervised fashion, is later used to train the first G2P system for the Iban language. As a by-product of this methodology, the analysis of the "pronunciation distance" between Malay and Iban sheds light on the phonological and orthographic relations between the two languages. The experiments show that a rather efficient Iban G2P system can be obtained after only two hours of post-editing (correcting) the output of the Malay G2P system applied to Iban words.
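The abstract does not say how the "pronunciation distance" is computed. One plausible way to quantify how far the Malay G2P output is from the post-edited Iban pronunciation is a phoneme-level edit distance, normalized by the length of the reference. The sketch below assumes this metric; the function names and the toy symbol sequences are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Sequence, Tuple


def phoneme_edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences.

    hyp: phonemes proposed by the source-language (e.g. Malay) G2P system.
    ref: phonemes after manual post-editing by a native speaker.
    Each insertion, deletion, or substitution costs 1 (an assumption).
    """
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete h from hyp
                            curr[j - 1] + 1,      # insert r into hyp
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def phoneme_error_rate(
    pairs: Iterable[Tuple[Sequence[str], Sequence[str]]]
) -> float:
    """Total edits divided by total reference phonemes over (hyp, ref) pairs."""
    pairs = list(pairs)
    edits = sum(phoneme_edit_distance(h, r) for h, r in pairs)
    total = sum(len(r) for _, r in pairs)
    return edits / total if total else 0.0


if __name__ == "__main__":
    # Toy placeholder sequences, not real Malay or Iban pronunciations.
    toy_pairs = [
        (["a", "b", "c"], ["a", "x", "c"]),
        (["a"], ["a"]),
    ]
    print(phoneme_error_rate(toy_pairs))
```

Under this assumed metric, a low error rate on the Malay-generated sequences would mean that little post-editing is needed per word, which is consistent with the short correction time reported in the abstract.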
Similar resources
Sixth International Joint Conference on Natural Language Processing Proceedings of the 11th Workshop on Asian Language Resources
Bilingual corpora play an important role as resources not only for machine translation research and development but also for studying tasks in comparative linguistics. Manual annotation of word alignments is of significance to provide a gold-standard for developing and evaluating machine translation models and comparative linguistics tasks. This paper presents research on building an English-Vi...
Sixth International Joint Conference on Natural Language Processing Proceedings of the Seventh SIGHAN Workshop on Chinese Language Processing
In this talk, we give a systematic view of the lexical semantics of the Chinese language. From a macro perspective, lexical conceptual meanings are classified into hierarchical semantic types, and each type plays a particular semantic function (Host, Attribute, or Value) to form a compositional semantic system. Lexical senses and their compositional functions will be exemp...
Sixth International Joint Conference on Natural Language Processing The First Workshop on Natural Language Processing for Medical and Healthcare Fields
This paper describes a method for extracting medical information from texts. The method aims to extract complaints and diagnoses from electronic health record texts. Complaints and diagnoses are fundamental information and can be used for more complex medical tasks. The method uses several medical knowledge resources to improve extraction performance. With an evaluation using NTCIR10 ...
Sixth International Joint Conference on Natural Language Processing Proceedings of the Workshop on Natural Language Processing for Social Media (SocialNLP)
In Taiwan, there are different types of TV programs, and each program usually has its own broadcast length and frequency. We collect word-of-mouth about broadcast TV programs on Facebook and apply a backpropagation network to predict the latest program audience rating. The TV audience rating is an important indicator of the popularity of programs, and it is also a factor that influences the rev...
Note from the Editor: Special issue on speech processing and soft computing
This special issue of the Journal is devoted to the work of twelve eminent speech scientists who apply novel soft computing methods to address some of the most difficult and persistent problems facing speech recognition systems today. I first heard these scientists discuss their innovative soft computing algorithms at the University of Salamanca where the Sixth International Conference on Soft ...